16 research outputs found

    Data-Driven Methods for Data Center Operations Support

    During the last decade, cloud technologies have evolved at an impressive pace, to the point that we now live in a cloud-native era where developers can leverage an unprecedented landscape of (possibly managed) services for orchestration, compute, storage, load balancing, monitoring, etc. On-demand access to a diverse set of configurable virtualized resources allows for building more elastic, flexible and highly resilient distributed applications. Behind the scenes, cloud providers sustain the heavy burden of maintaining the underlying infrastructures: large-scale distributed systems, partitioned and replicated among many geographically dispersed data centers to guarantee scalability, robustness to failures, high availability and low latency. The larger the scale, the more cloud providers have to deal with complex interactions among the various components, such that monitoring, diagnosing and troubleshooting issues become incredibly daunting tasks. To keep up with these challenges, development and operations practices have undergone significant transformations, especially in terms of improving the automations that make releasing new software, and responding to unforeseen issues, faster and more sustainable at scale. The resulting paradigm is nowadays referred to as DevOps. However, while such automations can be very sophisticated, traditional DevOps practices fundamentally rely on reactive mechanisms that typically require careful manual tuning and supervision from human experts. To minimize the risk of outages, and the related costs, it is crucial to provide DevOps teams with suitable tools that enable a proactive approach to data center operations. This work presents a comprehensive data-driven framework to address the most relevant problems that can be experienced in large-scale distributed cloud infrastructures. These environments are characterized by a very large availability of diverse data, collected at each level of the stack: time series (e.g., physical host measurements, virtual machine or container metrics, networking component logs, application KPIs); graphs (e.g., network topologies, fault graphs reporting dependencies among hardware and software components, performance-issue propagation networks); and text (e.g., source code, system logs, version control history, code review feedback). Such data are also typically updated with relatively high frequency, and are subject to distribution drifts caused by continuous configuration changes to the underlying infrastructure. In such a highly dynamic scenario, traditional model-driven approaches alone may be inadequate to capture the complexity of the interactions among system components. DevOps teams would certainly benefit from robust data-driven methods that support their decisions with historical information. For instance, effective anomaly detection capabilities can enable more precise and efficient root-cause analysis, while accurate forecasting and intelligent control strategies would improve resource management. Given their ability to deal with high-dimensional, complex data, Deep Learning-based methods are the most straightforward option for realizing the aforementioned support tools. On the other hand, because of their complexity, such models often require substantial processing power, and suitable hardware, to be operated effectively at scale. These aspects must be carefully addressed when applying such methods in the context of data center operations: automated operations approaches must be dependable and cost-efficient, so as not to degrade the services they are built to improve.
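    As a concrete illustration of the kind of support tool discussed above, the sketch below shows reconstruction-based anomaly detection on windows of a host metric with a small PyTorch autoencoder. It is a minimal example, not the framework from the thesis: the window size, layer widths and thresholding rule are all illustrative assumptions.

    # Minimal sketch (not the thesis' actual framework): reconstruction-based
    # anomaly detection on host metrics with a small PyTorch autoencoder.
    # Window size, layer widths and the threshold rule are assumptions.
    import torch
    import torch.nn as nn

    WINDOW = 32  # consecutive metric samples per input vector (assumed)

    model = nn.Sequential(
        nn.Linear(WINDOW, 16), nn.ReLU(),
        nn.Linear(16, 4),               # low-dimensional bottleneck
        nn.Linear(4, 16), nn.ReLU(),
        nn.Linear(16, WINDOW),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train(windows: torch.Tensor, epochs: int = 50) -> None:
        """Fit the autoencoder on windows assumed to be mostly normal."""
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(windows), windows)
            loss.backward()
            opt.step()

    def anomaly_scores(windows: torch.Tensor) -> torch.Tensor:
        """Per-window reconstruction error; high error = candidate anomaly."""
        with torch.no_grad():
            return ((model(windows) - windows) ** 2).mean(dim=1)

    # Usage: flag windows whose score exceeds, e.g., the 99th percentile
    # of the errors observed on the training data.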

    A Tool for Aligning Event Logs and Prescriptive Process Models through Automated Planning

    In Conformance Checking, alignment is the problem of detecting and repairing nonconformity between the actual execution of a business process, as recorded in an event log, and the model of the same process. The literature proposes solutions for the alignment problem that are implementations of planning algorithms built ad hoc for the specific problem. Unfortunately, in the era of big data, these ad-hoc implementations do not scale sufficiently well compared with well-established planning systems. In this paper, we tackle this issue by presenting a tool, also available in ProM, to represent instances of the alignment problem as automated planning problems in PDDL (Planning Domain Definition Language), for which state-of-the-art planners can find a correct solution in a finite amount of time. If alignment problems are converted into planning problems, one can seamlessly upgrade to the most recent versions of the best-performing automated planners, with advantages in terms of versatility and customization. Furthermore, by employing several processes and event logs of different sizes, we show how our tool outperforms existing approaches by several orders of magnitude and, in certain cases, carries out the task whereas existing approaches run out of memory.
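    To make the encoding idea concrete, the following sketch emits a toy PDDL problem from a single log trace. It is a hedged illustration under assumed predicates (logged, explained) and a hypothetical trace-alignment domain with cost-0 synchronous moves and cost-1 log/model moves; the tool's actual encoding also has to capture the process model's state space.

    # Hedged sketch of the general idea: serialize one alignment instance as
    # a PDDL problem so an off-the-shelf planner can solve it. Predicates and
    # the referenced domain are illustrative, not the tool's actual encoding.
    def to_pddl_problem(trace, activities, name="align-instance"):
        """Emit a toy PDDL problem: the planner must 'explain' every logged
        event, choosing between synchronous moves (cost 0) and log/model
        moves (cost 1) declared in the accompanying domain file."""
        events = " ".join(f"e{i}" for i in range(len(trace)))
        init = "\n    ".join(
            f"(logged e{i} {act})" for i, act in enumerate(trace)
        )
        goal = " ".join(f"(explained e{i})" for i in range(len(trace)))
        return f"""(define (problem {name})
      (:domain trace-alignment)
      (:objects {events} - event {' '.join(activities)} - activity)
      (:init {init})
      (:goal (and {goal}))
      (:metric minimize (total-cost)))"""

    print(to_pddl_problem(["register", "check", "pay"],
                          ["register", "check", "pay", "cancel"]))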

    Analyzing Declarative Deployment Code with Large Language Models

    In the cloud-native era, developers have at their disposal an unprecedented landscape of services to build scalable distributed systems. The DevOps paradigm emerged as a response to the increasing need for better automations, capable of dealing with the complexity of modern cloud systems. For instance, Infrastructure-as-Code tools provide a declarative way to define, track, and automate changes to the infrastructure underlying a cloud application. Assuring the quality of this part of a code base is of the utmost importance. However, learning to produce robust deployment specifications is no easy feat, and it is time-consuming for domain experts to conduct code reviews and transfer the appropriate knowledge to novice members of the team. Given the abundance of data generated throughout the DevOps cycle, machine learning (ML) techniques seem a promising way to tackle this problem. In this work, we propose an approach based on Large Language Models to analyze declarative deployment code and automatically provide QA-related recommendations to developers, so that they can benefit from established best practices and design patterns. We developed a prototype of our proposed ML pipeline, and empirically evaluated our approach on a collection of Kubernetes manifests exported from a repository of internal projects at Nokia Bell Labs.
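    Since the paper's pipeline is internal, the sketch below only conveys the shape of the approach: parse a manifest, then ask a language model for QA recommendations. The prompt wording, the listed best practices, and the ask_llm callable are hypothetical stand-ins, not the paper's actual pipeline.

    # Illustrative sketch only. `ask_llm` is a hypothetical stand-in for
    # whatever LLM endpoint is available; prompt and checks are assumptions.
    import yaml  # PyYAML

    PROMPT = (
        "You review Kubernetes manifests. List violations of established "
        "best practices (e.g., missing resource limits, :latest image tags, "
        "privileged containers) and suggest fixes.\n\nManifest:\n{manifest}"
    )

    def review_manifest(manifest_text: str, ask_llm) -> str:
        yaml.safe_load(manifest_text)  # fail fast on syntactically broken input
        return ask_llm(PROMPT.format(manifest=manifest_text))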

    Self-supervised pre-training of CNNs for flatness defect classification in the steelworks industry

    Classification of surface defects in the steelworks industry plays a significant role in guaranteeing the quality of the products. From an industrial point of view, a serious concern is represented by shape defects in hot-rolled products, particularly those concerning strip flatness. Flatness defects are typically divided into four sub-classes depending on which part of the strip is affected and on the corresponding shape. The primary objective of this research is to evaluate the improvements obtained by exploiting the self-supervised learning paradigm for defect classification, taking advantage of unlabelled, real steel-strip flatness maps. Different pre-training methods are compared, as well as architectures, taking advantage of well-established neural subnetworks such as Residual and Inception modules. A systematic approach to evaluating the different performances guarantees a formal verification of the self-supervised pre-training paradigms evaluated hereafter. In particular, pre-training neural networks with the EgoMotion meta-algorithm shows classification improvements over the AutoEncoder technique, which in turn performs better than a Glorot weight initialization.
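    The pre-train-then-fine-tune pattern the abstract relies on can be sketched in a few lines of PyTorch: learn an encoder on a pretext task over unlabelled maps, then reuse its weights under a classification head for the four defect sub-classes. The layer sizes and the pretext head below are illustrative assumptions, not the paper's architectures.

    # Minimal structural sketch of self-supervised pre-training followed by
    # supervised fine-tuning on flatness maps; all sizes are assumptions.
    import torch.nn as nn

    encoder = nn.Sequential(            # shared feature extractor
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        nn.Flatten(),
    )

    # Stage 1: self-supervised pre-training, e.g. as the encoder half of an
    # autoencoder trained to reconstruct unlabelled flatness maps.
    pretext_head = nn.Linear(16 * 4 * 4, 64)

    # Stage 2: discard the pretext head, keep the encoder weights, and attach
    # a classifier over the four flatness-defect sub-classes.
    classifier = nn.Sequential(encoder, nn.Linear(16 * 4 * 4, 4))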

    Behavioral Analysis for Virtualized Network Functions: A SOM-based Approach

    In this paper, we tackle the problem of detecting anomalous behaviors in a virtualized infrastructure for network function virtualization, proposing to use self-organizing maps to analyze historical data available through a data center. We propose a joint analysis of system-level metrics, mostly related to the resource consumption patterns of the hosted virtual machines, as available through the virtualized infrastructure monitoring system, and the application-level metrics published by individual virtualized network functions through their own monitoring subsystems. Experimental results, obtained by processing real data from one of the NFV data centers of the Vodafone network operator, show that our technique is able to identify specific points in space and time of the recent evolution of the monitored infrastructure that are worth investigating by a human operator in order to keep the system running under expected conditions.
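    For readers unfamiliar with self-organizing maps, the following NumPy sketch shows the core training loop: find the best-matching unit for each metric vector and pull nearby map units toward it. The grid size, learning rate and neighborhood width are assumptions, and a full implementation would also decay the latter two over time; the paper's pipeline (joint system- and application-level features) is richer.

    # Compact sketch of the core SOM update rule on metric vectors.
    import numpy as np

    def train_som(data, rows=8, cols=8, epochs=20, lr=0.5, sigma=2.0):
        """data: (n_samples, n_features) normalized metric vectors."""
        rng = np.random.default_rng(0)
        weights = rng.random((rows, cols, data.shape[1]))
        grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                    indexing="ij"), axis=-1)
        for _ in range(epochs):
            for x in data:
                # best-matching unit = node whose weights are closest to x
                d = np.linalg.norm(weights - x, axis=2)
                bmu = np.unravel_index(d.argmin(), d.shape)
                # pull every node toward x, weighted by distance to the BMU
                g = np.exp(-((grid - bmu) ** 2).sum(axis=2) / (2 * sigma ** 2))
                weights += lr * g[..., None] * (x - weights)
        return weights

    # Samples mapping to sparsely populated or distant units are flagged as
    # behaviors worth a human operator's attention.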

    SOM-based behavioral analysis for virtualized network functions

    In this paper, we propose a mechanism based on Self-Organizing Maps for analyzing resource consumption behaviors and detecting possible anomalies in data centers for Network Function Virtualization (NFV). Our approach is based on a joint analysis of two historical data sets available through two separate monitoring systems: system-level metrics for the physical and virtual machines, obtained from the monitoring infrastructure, and application-level metrics available from the individual virtualized network functions. Experimental results, obtained by processing real data from one of the NFV data centers of the Vodafone network operator, highlight the capabilities of our system to identify interesting points in space and time in the evolution of the monitored infrastructure.

    Forecasting Operation Metrics for Virtualized Network Functions

    Network Function Virtualization (NFV) is the key technology that allows modern network operators to provide flexible and efficient services by leveraging general-purpose private cloud infrastructures. In this work, we investigate the performance of a number of metric forecasting techniques based on machine learning and artificial intelligence, and provide insights on how they can support the decisions of NFV operation teams. Our analysis focuses on both infrastructure-level and service-level metrics. The former can be fetched directly from the monitoring system of an NFV infrastructure, whereas the latter are typically provided by the monitoring components of the individual virtualized network functions. The selected forecasting techniques are experimentally evaluated using real-life data exported from a production environment deployed within some Vodafone NFV data centers. The results show what the compared techniques can achieve in terms of forecasting accuracy and the computational cost required to train them on production data.
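    A baseline of the kind such comparisons typically include is windowed autoregression over a single operation metric; the scikit-learn sketch below illustrates it on synthetic data. The window length, horizon and model choice are assumptions, not the paper's selected techniques.

    # Hedged baseline: windowed autoregression on one metric series.
    import numpy as np
    from sklearn.linear_model import Ridge

    def make_windows(series, window=24, horizon=1):
        """Turn a 1-D metric series into (past window -> future value) pairs."""
        X = np.lib.stride_tricks.sliding_window_view(series, window)[:-horizon]
        y = series[window + horizon - 1:]
        return X, y

    series = np.sin(np.linspace(0, 20, 500)) + 0.1 * np.random.randn(500)
    X, y = make_windows(series)
    split = int(0.8 * len(X))          # train on the past, test on the future
    model = Ridge().fit(X[:split], y[:split])
    mae = np.abs(model.predict(X[split:]) - y[split:]).mean()
    print(f"test MAE: {mae:.3f}")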

    No detection of methane on Mars from early ExoMars Trace Gas Orbiter observations

    The detection of methane on Mars has been interpreted as indicating that geochemical or biotic activity could persist on Mars today. A number of different measurements of methane show evidence of transient, locally elevated methane concentrations and seasonal variations in background methane concentrations. These measurements, however, are difficult to reconcile with our current understanding of the chemistry and physics of the Martian atmosphere, which, given methane's lifetime of several centuries, predicts an even, well-mixed distribution of methane. Here we report highly sensitive measurements of the atmosphere of Mars in an attempt to detect methane, using the ACS and NOMAD instruments onboard the ESA-Roscosmos ExoMars Trace Gas Orbiter from April to August 2018. We did not detect any methane over a range of latitudes in both hemispheres, obtaining an upper limit for methane of about 0.05 parts per billion by volume, which is 10 to 100 times lower than previously reported positive detections. We suggest that reconciling the present findings with the background methane concentrations found in the Gale crater would require an unknown process that can rapidly remove or sequester methane from the lower atmosphere before it spreads globally.

    Martian dust storm impact on atmospheric H2O and D/H observed by ExoMars Trace Gas Orbiter

    Global dust storms on Mars are rare but can affect the Martian atmosphere for several months. They can cause changes in atmospheric dynamics and inflation of the atmosphere, primarily owing to solar heating of the dust. In turn, changes in atmospheric dynamics can affect the distribution of atmospheric water vapour, with potential implications for the atmospheric photochemistry and climate on Mars. Recent observations of the water vapour abundance in the Martian atmosphere during dust storm conditions revealed a high-altitude increase in atmospheric water vapour that was more pronounced at high northern latitudes, as well as a decrease in the water column at low latitudes. Here we present concurrent, high-resolution measurements of dust, water and semiheavy water (HDO) at the onset of a global dust storm, obtained by the NOMAD and ACS instruments onboard the ExoMars Trace Gas Orbiter. We report the vertical distribution of the HDO/H2O ratio (D/H) from the planetary boundary layer up to an altitude of 80 kilometres. Our findings suggest that before the onset of the dust storm, HDO abundances were reduced to levels below detectability at altitudes above 40 kilometres. This decrease in HDO coincided with the presence of water-ice clouds. During the storm, an increase in the abundance of H2O and HDO was observed at altitudes between 40 and 80 kilometres. We propose that these increased abundances may be the result of warmer temperatures during the dust storm causing stronger atmospheric circulation and preventing ice cloud formation, which may confine water vapour to lower altitudes through gravitational fall and subsequent sublimation of ice crystals. The observed changes in H2O and HDO abundance occurred within a few days during the development of the dust storm, suggesting a fast impact of dust storms on the Martian atmosphere.

    Aligning partially-ordered process-execution traces and models using automated planning

    Conformance checking is the problem of verifying whether the actual executions of business processes, which are recorded by information systems in dedicated event logs, are compliant with a process model that encodes the process's constraints. Within conformance checking, alignment-based techniques can exactly pinpoint where deviations are observed. Existing alignment-based techniques rely on the assumption of perfect knowledge of the order in which the process's activities were executed in reality. However, experience shows that, due to logging errors and inaccuracies, it is not always possible to determine the exact order in which certain activities were executed. This paper illustrates an alignment-based technique in which this perfect-knowledge assumption on the execution order is removed. The technique transforms the problem of alignment-based conformance checking into a planning problem encoded in PDDL, for which planners can find a correct solution in a finite amount of time. We implemented the technique as a software tool that is integrated with state-of-the-art planners. To showcase its practical relevance and scalability, we report on experiments with a real-life case study and several synthetic ones of increasing complexity.